Poster: aronsson Date: Jan 16, 2021 7:08am
Forum: texts Subject: Re: Wrong language stated for this newspaper

Thanks, great!

The OCR (full text) looks rather good, but all Swedish ÅÄÖ have become AAO, so perhaps OCR was not done with language=Swedish? If OCR is rerun now, will it use any new algorithms for detection and separation of newspaper columns? All that is mentioned in the metadata is "Ocr - ABBYY FineReader 11.0 (Extended OCR)", but it doesn't mention which settings were used for the OCR.
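One rough way to check whether a page of OCR text has lost its Swedish letters is to count how often å, ä and ö occur among all letters. The sketch below is only a heuristic under that assumption; the function name and both sample strings are made up for illustration and are not taken from the item's actual text.

```python
# Heuristic sketch: share of Swedish-specific letters in a text.
# A share of zero in a long Swedish text suggests the OCR was run
# without Swedish language settings. Names and samples are hypothetical.
SWEDISH_LETTERS = set("åäöÅÄÖ")


def swedish_letter_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are å, ä or ö."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in SWEDISH_LETTERS for ch in letters) / len(letters)


good_ocr = "Börsen föll på måndagen"   # invented sample with ÅÄÖ intact
bad_ocr = "Borsen foll pa mandagen"    # same sample with ÅÄÖ flattened to AAO

print(swedish_letter_ratio(good_ocr))
print(swedish_letter_ratio(bad_ocr))
```

A ratio of exactly zero over a full newspaper page would be a strong hint, while a small nonzero value says little on its own.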

Since these items are named "Dagens Industri - March 13, 2015", sorting by title puts March 2015 before November 2014, and sorting by date (added in 2019) doesn't help. However, each item has an "Identifier - dagensindustri-2014-10-09", and sorting by that would make perfect sense. Unfortunately, the Newspapers collection has no option to sort by identifier.
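To illustrate the sorting problem, here is a minimal sketch comparing the two orderings. Only the identifier dagensindustri-2014-10-09 and the title pattern come from the items themselves; the other entries and the helper name issue_date are hypothetical, filled in to follow the same pattern.

```python
from datetime import date

# (title, identifier) pairs; only "dagensindustri-2014-10-09" is a real
# identifier quoted above, the rest are hypothetical examples of the pattern.
items = [
    ("Dagens Industri - March 13, 2015", "dagensindustri-2015-03-13"),
    ("Dagens Industri - November 4, 2014", "dagensindustri-2014-11-04"),
    ("Dagens Industri - October 9, 2014", "dagensindustri-2014-10-09"),
]


def issue_date(identifier: str) -> date:
    """Parse the trailing YYYY-MM-DD part of an identifier."""
    year, month, day = identifier.rsplit("-", 3)[1:]
    return date(int(year), int(month), int(day))


# Sorting by title is alphabetical on the month name, so March 2015
# lands before November 2014 and October 2014.
by_title = sorted(items, key=lambda item: item[0])

# Sorting by identifier is plain string order, but because the date is
# written as YYYY-MM-DD it coincides with chronological order.
by_identifier = sorted(items, key=lambda item: item[1])

for title, identifier in by_identifier:
    print(issue_date(identifier), title)
```

The identifier works as a sort key only because the date part is zero-padded and ordered year, month, day.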

Reply

Poster: Jeff Kaplan Date: Jan 16, 2021 9:36am
Forum: texts Subject: Re: Wrong language stated for this newspaper

https://archive.org/details/dagens-industri?&sort=date

Re-processing with language=Swedish. Check the text in a day or so.

This post was modified by Jeff Kaplan on 2021-01-16 17:36:46